Abstract

A simple score test of the normal two-parameter logistic (2PL) model is presented that examines
the potential attraction of the normal three-parameter logistic (3PL) model for use with a
particular item. Application is made to data from a test from the Praxis™ series. Results from
this example raise the question whether the normal 3PL model should be used routinely in
preference to the normal 2PL model unless evidence exists that a substantial gain in description
of data is achieved.

A simple variation on the traditional score test (Rao, 1973, p. 418) can be derived to check if
the three-parameter logistic (3PL) model is an attractive alternative to the two-parameter logistic
(2PL) model without actually fitting a 3PL model. This test is derived in Section 1. In Section 2,
its use is considered for data from the Praxis™ series of examinations. Implications of results for
psychometric practice are considered in Section 3. Although the specific application considered
here does not appear to be readily found in the literature, similar attempts at model diagnosis
have been employed in the past to detect other departures from the 2PL model (Glas, 1999).

Throughout this report, $n > 1$ examinees each take a test with $q > 3$ items, a random variable
$X_{ij}$ is 1 if item $j$ is answered correctly by examinee $i$, and $X_{ij}$ is 0 if item $j$ is not answered
correctly. The response vectors $X_i$, with coordinates $X_{ij}$, $1 \le j \le q$, are independent and
identically distributed. The set $\Gamma$ of possible values of $X_i$ consists of all $q$-dimensional vectors
such that each coordinate is 0 or 1. The distribution of $X_i$ is characterized by the array $p$ of
probabilities

$$p(x) = P(X_i = x)$$

for $x$ in $\Gamma$, so that $p$ is in the simplex $T$ of arrays $r$ with nonnegative elements $r(x)$, $x$ in $\Gamma$,
with a sum of 1. The log likelihood function at $r$ in $T$ is then

$$\ell(r) = \sum_{i=1}^{n} \log r(X_i),$$

and

$$\hat{H}(r) = -(nq)^{-1} \ell(r)$$

estimates the expected log penalty per item

$$H(r) = -q^{-1} E(\log r(X_1))$$

from probability prediction of $X_i$ by use of $r$. For a nonempty subset $S$ of $T$, the maximum log
likelihood $\ell(S)$ of $\ell(r)$ for $r$ in $S$ then leads to the minimum estimated expected log penalty per
item $\hat{H}(S) = -(nq)^{-1} \ell(S)$ of $\hat{H}(r)$ for $r$ in $S$ (Gilula & Haberman, 1994, 1995). Here $\hat{H}(S)$ is an
estimate of the minimum expected log penalty per item $H(S)$ of $H(r)$ for $r$ in $S$. A member $\hat{p}$ of
$S$ is a maximum-likelihood estimate of the probability array $p$ (relative to $S$) if $\ell(\hat{p}) = \ell(S)$.
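
As a concrete illustration of the estimated log penalty, the following minimal sketch evaluates $\hat{H}(r) = -(nq)^{-1} \sum_i \log r(X_i)$ for one simple choice of $r$. The function names, the small response matrix, and the use of an independence model fitted from observed proportions are assumptions made only for illustration; they are not taken from the report's software.

```python
import numpy as np

def estimated_log_penalty(log_r_at_data, q):
    """H-hat(r) = -(n q)^{-1} * sum_i log r(X_i), given log r(X_i) per examinee."""
    log_r_at_data = np.asarray(log_r_at_data, dtype=float)
    n = log_r_at_data.shape[0]
    return -log_r_at_data.sum() / (n * q)

def independence_log_probs(X):
    """log r(X_i) when r treats items as independent Bernoulli variables.

    Illustrative assumption: the success probability of item j is the
    observed proportion of correct responses to item j.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0)
    # Clip so that log(0) cannot occur if every examinee agrees on an item.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return (X * np.log(p) + (1.0 - X) * np.log1p(-p)).sum(axis=1)

# Hypothetical responses of 4 examinees to 3 items (each item has proportion 0.5).
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 0]])
penalty = estimated_log_penalty(independence_log_probs(X), q=X.shape[1])
print(penalty)  # equals log(2) here, since every fitted probability is 0.5
```

Any other array $r$ can be scored the same way by supplying its values of $\log r(X_i)$, which is how penalties for competing models become comparable on a per-item scale.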

In both the 2PL and 3PL models (Bock & Aitkin, 1981; Bock & Lieberman, 1970; Hambleton,
Swaminathan, & Rogers, 1991), a random ability variable $\theta_i$ is associated with each examinee $i$,
and the $X_{ij}$, $1 \le j \le q$, are conditionally independent given $\theta_i$. The pairs $(\theta_i, X_i)$ are independent
and identically distributed, and the distribution function of $\theta_i$ is $D$. In this report, the simple case
will be considered in which $D$ is assumed equal to the standard normal distribution function $\Phi$.
For each item $j$, the conditional probability $P_j(\theta)$ that $X_{ij} = 1$ given $\theta_i = \theta$ is positive and less
than 1, so that $Q_j(\theta) = 1 - P_j(\theta)$ is also positive and less than 1. The function $P_j$ is the item
characteristic curve, and

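$$\lambda_j = \log(P_j/Q_j)$$

is the item logit function (Holland, 1990), so that

$$P_j = \frac{\exp(\lambda_j)}{1 + \exp(\lambda_j)} \qquad (1)$$

and

$$Q_j = \frac{1}{1 + \exp(\lambda_j)}. \qquad (2)$$

Let $\lambda$ have coordinates $\lambda_j$ for $1 \le j \le q$, and let

$$u'v = \sum_{j=1}^{q} u_j v_j$$

for $q$-dimensional vectors $u$ and $v$ with respective coordinates $u_j$ and $v_j$ for $1 \le j \le q$. For

$$V = \prod_{j=1}^{q} Q_j = \prod_{j=1}^{q} [1 + \exp(\lambda_j)]^{-1}, \qquad (3)$$

a variation on the Dutch identity yields

$$p(x) = \int V \exp(x'\lambda)\, dD \qquad (4)$$

(Holland, 1990).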

The set $S_{2n}$ that corresponds to the normal 2PL model consists of all arrays $p$ in $T$ such that
(3) and (4) hold, $D = \Phi$, and

$$\lambda(\theta) = \theta a - \gamma \qquad (5)$$

for some $q$-dimensional vectors $a$ and $\gamma$ with respective coordinates $a_j > 0$ and $\gamma_j$ for $1 \le j \le q$.
For item $j$, the item discrimination is $a_j$, and the item difficulty is $\gamma_j/a_j$. The set $S_{3n}$ for the
normal 3PL model consists of all arrays $p$ in $T$ such that (3) and (4) hold, $D = \Phi$, and

$$\lambda_j(\theta) = \log\{[c_j + \exp(a_j\theta - \gamma_j)]/(1 - c_j)\} \qquad (6)$$

for some real $a_j > 0$, $c_j$ in $[0, 1)$, and $\gamma_j$. The 3PL case reduces to the 2PL case if each $c_j = 0$.
The $a_j$ and $\gamma_j$ can be interpreted as in the 2PL model, and $c_j$ is a guessing probability. In the
construction of the desired test statistic, the restriction set $S_{3nk}$ is considered for $1 \le k \le q$, in
which $p$ in $S_{3n}$ is in $S_{3nk}$ if (3), (4), and (6) hold for $D = \Phi$ and for some $a_j > 0$, $c_j$ in
$[0, 1)$, and $\gamma_j$, $1 \le j \le q$, such that $c_j = 0$ if $j \ne k$. For use in comparison of estimated expected
penalties, it is also helpful to note that the set $S_{1n}$ for the normal Rasch model consists of $p$ in
$S_{2n}$ such that (3) and (4) hold, $D = \Phi$, and (5) holds for some $q$-dimensional vectors $a$ and $\gamma$ with
respective coordinates $a_j > 0$ and $\gamma_j$ for $1 \le j \le q$ and $a_j = a_1$ for $j > 1$.
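
To make the model definitions concrete, the following sketch evaluates the 3PL item logits in (6) and the pattern probabilities in (3) and (4) with $D = \Phi$. The integral is approximated by Gauss-Hermite quadrature, which is an assumption of this sketch rather than a method stated in the report; the function names and parameter values are likewise hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def item_logits(theta, a, gamma, c):
    """Item logits from (6); with c = 0 they reduce to the 2PL form (5)."""
    z = np.outer(theta, a) - gamma            # a_j * theta - gamma_j
    return np.log(c + np.exp(z)) - np.log1p(-c)

def pattern_probability(x, a, gamma, c, n_points=41):
    """p(x) from (3) and (4) with D = Phi, approximated by quadrature."""
    theta, w = hermegauss(n_points)           # weight function exp(-theta^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)              # rescale to the N(0, 1) density
    lam = item_logits(theta, a, gamma, c)     # one row per quadrature node
    log_V = -np.logaddexp(0.0, lam).sum(axis=1)   # log of prod_j 1/(1 + exp(lambda_j))
    return float(np.sum(w * np.exp(log_V + lam @ np.asarray(x, dtype=float))))

# Hypothetical parameters for q = 3 items; only item 2 has a guessing probability.
a = np.array([1.0, 0.8, 1.5])
gamma = np.array([0.0, -0.5, 0.7])
c = np.array([0.0, 0.2, 0.0])

# The probabilities of all 2^q response patterns should sum to 1.
total = sum(pattern_probability(x, a, gamma, c)
            for x in itertools.product([0, 1], repeat=3))

# At very low ability, the probability of a correct response approaches c_j.
low = item_logits(np.array([-30.0]), a, gamma, c)
p_low = 1.0 / (1.0 + np.exp(-low[0, 1]))
```

Setting `c` to a vector with a single nonzero coordinate $k$ gives an array in the restricted set $S_{3nk}$, and setting every coordinate to 0 recovers the 2PL logits in (5).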

1. The Test Statistic 

To construct the desired score test statistic, consider an item $k$ from 1 to $q$. Consider the null
hypothesis that the probability array $p$ is in $S_{2n}$, so that the normal 2PL model holds, against the
alternative that $p$ is in $S_{3nk}$. Let $\hat{a}$ and $\hat{\gamma}$ be the respective maximum-likelihood estimates of the
vectors $a$ and $\gamma$ under the 2PL model. For $1 \le j \le q$, let $\hat{a}_j$ be coordinate $j$ of $\hat{a}$, and let $\hat{\gamma}_j$ be
coordinate $j$ of $\hat{\gamma}$. To construct the test, consider the $3q$-dimensional vector $\tau$ with coordinates
$\tau_j = a_j > 0$, $\tau_{q+j} = \gamma_j$, and $\tau_{2q+j} = c_j$ in $[0, 1)$ for $1 \le j \le q$. Let $p_\tau$ be the array in $S_{3n}$ such that
(3), (4), and (6) hold for $p = p_\tau$, and let $H(\tau) = \ell(p_\tau)$. Let

$$h_i(\tau) = \log p_\tau(X_i),$$

so that

$$H(\tau) = \sum_{i=1}^{n} h_i(\tau).$$

The test statistic requires partial derivatives of $H$. Let $h_{ij}(\tau)$ denote the partial derivative of
$h_i$ at $\tau$ with respect to $\tau_j$, and let $H_j(\tau)$ denote the partial derivative of $H$ at $\tau$ with respect to
$\tau_j$, so that

$$H_j = \sum_{i=1}^{n} h_{ij}.$$

Let $\tau^*$ be the $3q$-dimensional vector with coordinates $\tau^*_j = \hat{a}_j$, $\tau^*_{q+j} = \hat{\gamma}_j$, and $\tau^*_{2q+j} = 0$ for
$1 \le j \le q$. For item $j$, the score test statistic is $U_j = n^{-1} H_{2q+j}(\tau^*)$. To evaluate $U_j$, let

$$\hat{\lambda}(\theta) = \theta \hat{a} - \hat{\gamma}$$

be the maximum-likelihood estimate of $\lambda(\theta)$ under the 2PL model. Let $\hat{\lambda}_j$ be coordinate $j$ of $\hat{\lambda}$,
and let

$$\hat{V} = \prod_{j=1}^{q} [1 + \exp(\hat{\lambda}_j)]^{-1}$$

be the maximum-likelihood estimate of $V$ for the 2PL model. Let

$$\hat{P}_j = [1 + \exp(-\hat{\lambda}_j)]^{-1}$$

be the maximum-likelihood estimate of $P_j$ under the 2PL model. Use of the chain rule of
differentiation and use of standard properties of exponential families (Berk, 1972) shows that

$$U_j = n^{-1} \sum_{i=1}^{n} u_{ij},$$

where

$$u_{ij} = \frac{\int \hat{P}_j^{-1} (X_{ij} - \hat{P}_j) \exp(X_i'\hat{\lambda}) \hat{V}\, d\Phi}{\int \exp(X_i'\hat{\lambda}) \hat{V}\, d\Phi}.$$

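Comparison of the standard asymptotic variance formula for $n^{1/2} U_j$ (Aitchison & Silvey,
1958) with standard regression formulas (Rao, 1973, pp. 267-268) shows that the asymptotic
variance $\sigma_j^2$ of $n^{1/2} U_j$ is the same as the mean-squared error from linear prediction of $h_{i(2q+j)}(\tau)$
by $h_{ik}(\tau)$, $1 \le k \le 2q$. Differentiation shows that, under the 2PL model with $c_j = 0$ for $1 \le j \le q$,

$$h_{ij}(\tau) = \frac{\int \theta (X_{ij} - P_j) \exp(X_i'\lambda) V\, d\Phi}{\int \exp(X_i'\lambda) V\, d\Phi}$$

and

$$h_{i(q+j)}(\tau) = -\frac{\int (X_{ij} - P_j) \exp(X_i'\lambda) V\, d\Phi}{\int \exp(X_i'\lambda) V\, d\Phi}$$

for $1 \le j \le q$. The factor $\theta$ and the minus sign follow from (5), because $\lambda_j$ has derivative $\theta$ with
respect to $a_j$ and derivative $-1$ with respect to $\gamma_j$.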

It is a straightforward matter to verify that $\sigma_j^2$ is consistently estimated by the residual
mean-squared error $s_j^2$ from linear regression of $u_{ij}$ onto $\hat{h}_{ik} = h_{ik}(\tau^*)$ for $1 \le k \le 2q$, where
$1 \le i \le n$. The desired statistic for item $j$ is then $t_j = n^{1/2} U_j / s_j$. The statistic $t_j$ has
an approximate standard normal distribution under the 2PL model, with the approximation
increasingly accurate as the sample size becomes large. If $p_\tau$ is in $S_{3nj}$ for some item $j$ and if $c_j$ is
small, then $\ell(S_{3nj})$ is well approximated by $\ell(S_{2n}) + t_j^2/2$. Because $\hat{H} = -(nq)^{-1}\ell$, it follows that
$\hat{H}(S_{3nj})$ is well approximated by $\hat{H}(S_{2n}) - t_j^2/(2nq)$.
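
The steps of this section can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the report's implementation: integrals with respect to $\Phi$ are replaced by Gauss-Hermite quadrature, the regression is run without an intercept, the residual mean-squared error uses $n - 2q$ degrees of freedom, and the function name `guessing_score_tests` is hypothetical. Each ratio of integrals is computed as a posterior expectation given $X_i$, so that, for example, $u_{ij} = X_{ij} E(1/\hat{P}_j \mid X_i) - 1$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def guessing_score_tests(X, a_hat, gamma_hat, n_points=41):
    """Return (t, U, s) for the score tests of c_j = 0, given 2PL estimates."""
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    theta, w = hermegauss(n_points)
    w = w / np.sqrt(2.0 * np.pi)                   # weights for the N(0, 1) density
    lam = np.outer(theta, a_hat) - gamma_hat       # logits at each node, shape (m, q)
    P = 1.0 / (1.0 + np.exp(-lam))                 # estimated P_j at each node
    log_V = -np.logaddexp(0.0, lam).sum(axis=1)    # log V-hat at each node
    # Posterior node weights for each examinee, proportional to
    # w * V-hat * exp(X_i' lambda-hat); each row is normalized to sum to 1.
    log_post = np.log(w) + log_V + X @ lam.T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Score contributions as posterior expectations.
    u = X * (post @ (1.0 / P)) - 1.0                               # guessing parameters
    h_a = X * (post @ theta)[:, None] - post @ (theta[:, None] * P)  # discriminations
    h_g = -(X - post @ P)                                          # intercepts gamma_j
    Z = np.hstack([h_a, h_g])                      # n by 2q matrix of regressors
    coef, *_ = np.linalg.lstsq(Z, u, rcond=None)   # one regression per item
    resid = u - Z @ coef
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - 2 * q))
    U = u.mean(axis=0)
    return np.sqrt(n) * U / s, U, s

# Simulated 2PL data with hypothetical parameters. The true parameters stand in
# for the maximum-likelihood estimates, which a real analysis would compute first.
rng = np.random.default_rng(0)
n, q = 2000, 5
a_true = np.array([1.0, 0.8, 1.2, 1.5, 0.9])
gamma_true = np.array([0.0, -0.5, 0.5, 1.0, -1.0])
ability = rng.standard_normal(n)
P_true = 1.0 / (1.0 + np.exp(-(np.outer(ability, a_true) - gamma_true)))
X = (rng.random((n, q)) < P_true).astype(float)
t, U, s = guessing_score_tests(X, a_true, gamma_true)
```

Because the 2PL model is never refit, the cost is one pass over the quadrature nodes and one least-squares problem, which is the practical appeal of a score test over fitting the 3PL model for every item.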

2. Application to Data

In the example under study, $n = 8{,}686$ and $q = 45$. The test statistics for each item are
shown in Table 1. Despite a substantial sample size, many items have score statistics compatible
with the normal 2PL model. For example, $|t_k| < 2$ in 15 cases. On the other hand, items not
compatible with the 2PL model are readily found, for 26 items have $t_k$ greater than 2, and 4 items
have $t_k$ less than $-2$. Even with allowances for multiple comparisons, 10 standardized values that
exceed 4 are very unlikely to occur by chance if the model is true.

It should be emphasized that the test statistics do not imply that the 3PL model provides a
description of the data that is much better than the description provided by the 2PL model. For
some perspective on this point, consider some estimated log-penalty functions that can be derived
for the data under study. The estimate $\hat{H}(S_{2n})$ for the normal 2PL model is 0.59157, while $\hat{H}(S_{3n})$
for the normal 3PL model is 0.59074. This gain of 0.00083 is quite modest. For comparison, the
minimum estimated expected penalty per item for the normal one-parameter logistic (1PL) model
is 0.59639, so that the gain from use of the normal 2PL rather than the normal 1PL model is
0.00482. The estimated expected log penalty under the trivial model that all $X_{ij}$, $1 \le j \le q$, are
independent is 0.62467, so that the gain for the normal 1PL model over the independence model is
0.02828, a much larger gain than the gain from the normal 1PL model to the normal 2PL model.
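
The gains quoted above follow directly from the four reported penalties. The short sketch below only repeats that arithmetic; the dictionary keys are labels chosen here for illustration.

```python
# Estimated expected log penalties per item, as reported in the text.
penalties = {
    "independence": 0.62467,
    "1PL": 0.59639,
    "2PL": 0.59157,
    "3PL": 0.59074,
}

# Gain from each model to the next more general model, rounded to 5 decimals.
order = ["independence", "1PL", "2PL", "3PL"]
gains = {
    f"{simpler} to {richer}": round(penalties[simpler] - penalties[richer], 5)
    for simpler, richer in zip(order, order[1:])
}
print(gains)
# {'independence to 1PL': 0.02828, '1PL to 2PL': 0.00482, '2PL to 3PL': 0.00083}
```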

Table 1

Results of Tests for Nonzero Guessing Probabilities

Item    Score average    Standard error    Standardized value
 k          U_k               s_k                 t_k
 1        0.00043           0.00054             0.80478
 2        0.00132           0.00020             6.50359
 3        0.00073           0.00189             0.38535
 4       -0.00281           0.00128            -2.18717
 5        0.00780           0.00271             2.87903
 6        0.00086           0.00052             1.66437
 7        0.00071           0.00026             2.73552
 8        0.01491           0.00258             5.77798
 9        0.01644           0.00279             5.88547
10       -0.00106           0.00076            -1.39806
11        0.00102           0.00034             2.99920
12        0.01944           0.00408             4.76351
13        0.00463           0.00084             5.50615
14        0.01994           0.00395             5.05235
15        0.00026           0.00015             1.73915
16        0.00116           0.00047             2.48215
17        0.00192           0.00031             6.17770
18        0.00766           0.00101             7.57526
19        0.00546           0.00263             2.07236
20        0.00238           0.00036             6.69724
21        0.00028           0.00053             0.52986
22        0.00112           0.00294             0.38055
23       -0.00017           0.00038            -0.46052
24        0.00112           0.00029             3.84572
25        0.00116           0.00066             1.76433
26        0.00752           0.00468             1.60910
27        0.00052           0.00044             1.18942
28        0.00906           0.00269             3.36499
29       -0.00001           0.00045            -0.02107
30       -0.00046           0.00056            -0.82410
31        0.00049           0.00072             0.68124
32       -0.00086           0.00031            -2.76761
33       -0.00161           0.00065            -2.47247
34        0.00712           0.00174             4.09183
35        0.01303           0.00378             3.44932
36        0.01091           0.00354             3.08384
37       -0.00298           0.00137            -2.17524
38        0.00398           0.00112             3.55390
39        0.01109           0.00440             2.51879
40        0.00481           0.00147             3.27537
41        0.00463           0.00175             2.64318
42        0.00287           0.00115             2.48429
43        0.00444           0.00132             3.37090
44        0.00400           0.00207             1.92916
45        0.00695           0.00193             3.60106
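
The tallies quoted in the text can be recomputed from the standardized values in Table 1. The sketch below simply counts the values in each range; the list `t` is transcribed from the table in item order.

```python
# Standardized values t_k for items 1 to 45, transcribed from Table 1.
t = [
    0.80478, 6.50359, 0.38535, -2.18717, 2.87903, 1.66437, 2.73552, 5.77798,
    5.88547, -1.39806, 2.99920, 4.76351, 5.50615, 5.05235, 1.73915, 2.48215,
    6.17770, 7.57526, 2.07236, 6.69724, 0.52986, 0.38055, -0.46052, 3.84572,
    1.76433, 1.60910, 1.18942, 3.36499, -0.02107, -0.82410, 0.68124, -2.76761,
    -2.47247, 4.09183, 3.44932, 3.08384, -2.17524, 3.55390, 2.51879, 3.27537,
    2.64318, 2.48429, 3.37090, 1.92916, 3.60106,
]

within_two = sum(abs(v) < 2 for v in t)   # compatible with the 2PL model
above_two = sum(v > 2 for v in t)         # evidence of a positive guessing parameter
below_minus_two = sum(v < -2 for v in t)  # departures in the opposite direction
above_four = sum(v > 4 for v in t)

print(within_two, above_two, below_minus_two, above_four)  # 15 26 4 10
```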


3. Conclusions 

The analysis here suggests that routine use of the normal 3PL model may not necessarily 
be wise. For the data under study, the score test suggests that many guessing parameters are 
not clearly different from 0 even if the normal 3PL model holds. In addition, the gain in data
description from use of a 3PL rather than a 2PL model appears small. Given the much greater
computational difficulties associated with the 3PL model relative to those for the 2PL model, the 
question must be raised whether proponents of the 3PL model can demonstrate cases in which 
the gain from the 3PL model rather than the 2PL model is much larger than is observed here. 
The issue of guessing parameters not clearly positive is especially important from a computational 
perspective, for computations with the 3PL model are hardest when the guessing probabilities do 
not clearly differ from 0 (Hambleton et al., 1991, p. 44). 

An alternative approach to testing a 2PL versus a 3PL model would involve a likelihood-ratio
chi-square test statistic such as $2nq[\hat{H}(S_{2n}) - \hat{H}(S_{3n})]$; however, such a test involves two
complications. The 3PL model must be fit, and, even if the null hypothesis holds and the sample
size is large, the chi-square approximation is not satisfactory due to the requirement that the
guessing parameters $c_j$ be nonnegative.
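
For a sense of scale, the statistic $2nq[\hat{H}(S_{2n}) - \hat{H}(S_{3n})]$ can be evaluated from the penalties reported in Section 2. The sketch below is only arithmetic on those published values, so the result inherits their rounding to five decimal places, and, for the reasons just given, it should not be referred to a chi-square table.

```python
n, q = 8686, 45           # examinees and items in the example
H_2pl = 0.59157           # reported estimate for the normal 2PL model
H_3pl = 0.59074           # reported estimate for the normal 3PL model

# Likelihood-ratio statistic: twice the difference in maximized log likelihoods,
# written in terms of the per-item penalties, since H-hat = -(n q)^{-1} * loglik.
lr_statistic = 2 * n * q * (H_2pl - H_3pl)
print(round(lr_statistic, 1))  # about 648.8, subject to rounding of the inputs
```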